Extract Me If You Can: Abusing PDF Parsers in Malware Detectors
Authors
Abstract
Owing to the popularity of the PDF format and the continued exploitation of Adobe Reader, the detection of malicious PDFs remains a concern. All existing detection techniques rely on a PDF parser to some extent, while the complexity of the PDF format leaves ample room for parser confusion. To quantify the differences between these parsers and Adobe Reader, we create a reference JavaScript extractor by tapping directly into Adobe Reader at locations identified through a mostly automatic binary analysis technique. By comparing the output of this reference extractor against that of several open-source JavaScript extractors on a large data set obtained from VirusTotal, we identify hundreds of samples from which existing extractors fail to extract JavaScript. Analyzing these samples reveals several weaknesses in each of these extractors. Based on these lessons, we apply several obfuscations to a malicious PDF sample, producing a sample that evades all the malware detectors tested. We call this evasion technique a PDF parser confusion attack. Lastly, we demonstrate that the reference JavaScript extractor improves the accuracy of existing JavaScript-based classifiers and show how it can be used to mitigate these parser limitations in a real-world setting.
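The differential comparison described above (flag every sample where the reference extractor finds JavaScript but a candidate extractor does not) can be illustrated with a toy sketch. This is not the paper's implementation: the paper's reference extractor taps into Adobe Reader itself, whereas here a hypothetical `normalizing_extractor` that decodes `#XX` escapes in PDF name objects stands in for it. The names `naive_extractor`, `decode_pdf_name`, and `differential` are likewise hypothetical. The example assumes one well-known confusion vector, namely that the PDF specification permits name objects such as `/JS` to be written with hex escapes (e.g. `/J#53`); it is not necessarily one of the weaknesses the authors report.

```python
import re

# Hypothetical toy extractors: each takes raw PDF bytes and returns True if it
# believes the file contains JavaScript. Real extractors parse the full object
# graph; these only look at name tokens, which is enough to show a disagreement.

# A PDF name token: "/" followed by non-whitespace, non-delimiter bytes.
NAME_RE = re.compile(rb"/([^\s/\[\]<>(){}%]+)")


def decode_pdf_name(raw: bytes) -> bytes:
    """Decode #XX hex escapes in a PDF name token, e.g. b"J#53" -> b"JS"."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def naive_extractor(pdf: bytes) -> bool:
    # Literal substring match: misses hex-escaped spellings of the keys.
    return b"/JavaScript" in pdf or b"/JS" in pdf


def normalizing_extractor(pdf: bytes) -> bool:
    # Stand-in "reference": normalizes every name before comparing.
    names = {decode_pdf_name(m.group(1)) for m in NAME_RE.finditer(pdf)}
    return bool(names & {b"JavaScript", b"JS"})


def differential(samples, reference, candidate):
    """Return sample names where the reference finds JS but the candidate misses it."""
    return [name for name, data in samples.items()
            if reference(data) and not candidate(data)]


# Usage: three tiny fragments standing in for PDF files (assumed, not real samples).
samples = {
    "plain.pdf": b"<< /S /JavaScript /JS (app.alert(1)) >>",
    "escaped.pdf": b"<< /S /Java#53cript /J#53 (app.alert(1)) >>",
    "benign.pdf": b"<< /Type /Page >>",
}
print(differential(samples, normalizing_extractor, naive_extractor))  # ['escaped.pdf']
```

A real comparison would run complete parsers over a large corpus; the sketch only shows why any case where the reference sees JavaScript and the candidate does not marks a sample worth inspecting for parser confusion.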
Similar resources
Unsupervised Anomaly-Based Malware Detection Using Hardware Features
Recent works have shown promise in using microarchitectural execution patterns to detect malware programs. These detectors belong to a class of detectors known as signature-based detectors, as they catch malware by comparing a program’s execution pattern (signature) to execution patterns of known malware programs. In this work, we propose a new class of detectors: anomaly-based hardware malware ...
Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly
Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change behavior to be categorized as benign while preserving malicious functionality. Research into evasion attacks has...
Advanced Persistent Threat: Malicious Code Hidden in PDF Documents
In recent years, Advanced Persistent Threat (APT) has become a very popular choice for stealing information from specific targets by exploiting vulnerabilities on the targets’ machines. APT involves a set of complex phases, which are difficult to detect and often initiated with spear phishing in the early stage of invasion. To help defend against APT, it is important to study the malformed Portable Documen...
Correcting Proofs via PDF Commenting
The “paperless office” often works better in theory than the real world, but it is becoming feasible to mark text corrections electronically. The free Adobe Reader provides a convenient means for doing so via “PDF Commenting” (also known as “Acrobat Commenting”) once this capability has been enabled for a given PDF. The basic commenting process is quite simple. You open the enabled PDF with Ado...
Malware Normalization
Malware is code designed for a malicious purpose, such as obtaining root privilege on a host. A malware detector identifies malware and thus prevents it from adversely affecting a host. In order to evade detection by malware detectors, malware writers use various obfuscation techniques to transform their malware. There is strong evidence that commercial malware detectors are susceptible to thes...